Overview

Dataset statistics

Number of variables: 15
Number of observations: 1005348
Missing cells: 1118932
Missing cells (%): 7.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 360.9 MiB
Average record size in memory: 376.5 B
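
The derived figures in this block can be checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over variables times observations and that 1 MiB is 1024 * 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 15
n_observations = 1_005_348
missing_cells = 1_118_932
total_memory_mib = 360.9

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_memory_mib * 1024 * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The missing-cell share rounds to the reported 7.4%. The record size comes out close to, but not exactly at, the reported 376.5 B, presumably because the total memory figure is itself rounded or measured slightly differently by the tool.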

Variable types

Categorical: 9
Numeric: 6

Alerts

BerRating is highly correlated with CO2Rating [High correlation]
GroundFloorArea(sq m) is highly correlated with TotalDeliveredEnergy [High correlation]
CO2Rating is highly correlated with BerRating and 1 other field [High correlation]
TotalDeliveredEnergy is highly correlated with GroundFloorArea(sq m) and 1 other field [High correlation]
BerRating is highly correlated with CO2Rating and 1 other field [High correlation]
CO2Rating is highly correlated with BerRating and 1 other field [High correlation]
TotalDeliveredEnergy is highly correlated with BerRating and 1 other field [High correlation]
BerRating is highly correlated with CO2Rating [High correlation]
CO2Rating is highly correlated with BerRating [High correlation]
MainWaterHeatingFuel is highly correlated with MainSpaceHeatingFuel [High correlation]
MainSpaceHeatingFuel is highly correlated with MainWaterHeatingFuel [High correlation]
YearofConstruction is highly correlated with EnergyRating [High correlation]
EnergyRating is highly correlated with YearofConstruction and 2 other fields [High correlation]
BerRating is highly correlated with CO2Rating and 1 other field [High correlation]
CO2Rating is highly correlated with BerRating and 1 other field [High correlation]
MainSpaceHeatingFuel is highly correlated with MainWaterHeatingFuel [High correlation]
MainWaterHeatingFuel is highly correlated with MainSpaceHeatingFuel [High correlation]
VentilationMethod is highly correlated with EnergyRating [High correlation]
InsulationType is highly correlated with EnergyRating [High correlation]
TotalDeliveredEnergy is highly correlated with BerRating and 1 other field [High correlation]
MainSpaceHeatingFuel has 14110 (1.4%) missing values [Missing]
MainWaterHeatingFuel has 14110 (1.4%) missing values [Missing]
StructureType has 72276 (7.2%) missing values [Missing]
InsulationType has 221093 (22.0%) missing values [Missing]
InsulationThickness has 221093 (22.0%) missing values [Missing]
TotalDeliveredEnergy has 570448 (56.7%) missing values [Missing]
BerRating is highly skewed (γ1 = 52.94408732) [Skewed]
CO2Rating is highly skewed (γ1 = 69.10763716) [Skewed]
TotalDeliveredEnergy is highly skewed (γ1 = 87.3140669) [Skewed]
InsulationThickness has 122900 (12.2%) zeros [Zeros]
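
The percentages in the missing-value and zero alerts are the counts divided by the number of observations. As a rough consistency check, the sketch below also adds the two smaller missing counts of 2901 that appear in the per-variable sections (VentilationMethod and one variable whose name is not shown) and compares the sum with the overall missing-cell count. The dictionary and helper names are illustrative only.

```python
n_observations = 1_005_348
total_missing_cells = 1_118_932  # from the dataset statistics

# Missing-value counts copied from the alerts.
missing_alerts = {
    "MainSpaceHeatingFuel": 14_110,
    "MainWaterHeatingFuel": 14_110,
    "StructureType": 72_276,
    "InsulationType": 221_093,
    "InsulationThickness": 221_093,
    "TotalDeliveredEnergy": 570_448,
}

# Two variables report 2901 missing values each without raising an alert.
missing_without_alert = 2 * 2_901


def pct(count, total=n_observations):
    """Share of observations, in percent, rounded to one decimal."""
    return round(100 * count / total, 1)


missing_pct = {name: pct(count) for name, count in missing_alerts.items()}
zeros_pct = pct(122_900)  # InsulationThickness zeros
missing_sum = sum(missing_alerts.values()) + missing_without_alert

for name, value in missing_pct.items():
    print(f"{name}: {value}%")
print(f"All missing cells accounted for: {missing_sum == total_missing_cells}")
```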

Reproduction

Analysis started: 2022-08-06 11:11:14.290084
Analysis finished: 2022-08-06 11:11:50.801926
Duration: 36.51 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
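
The reported duration is the difference between the two timestamps. A minimal check with the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2022-08-06 11:11:14.290084", FMT)
finished = datetime.strptime("2022-08-06 11:11:50.801926", FMT)

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```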

Variables

CountyName
Categorical

Distinct: 26
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 984.6 KiB

Value               Count
Dublin              298379
Cork                113668
Galway              54664
Kildare             44498
Limerick            43389
Other values (21)   450750

Length

Max length: 9
Median length: 8
Mean length: 6.118182958
Min length: 4

Characters and Unicode

Total characters: 6150903
Distinct characters: 34
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Donegal
2nd row: Kildare
3rd row: Dublin
4th row: Dublin
5th row: Dublin
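
The length and character-category figures for a text column can be reproduced on a small scale. The sketch below applies Python's standard unicodedata module to the five sample rows; this is an illustrative stand-in and not necessarily how the report generator computes its statistics. The function name category_counts is hypothetical.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The five sample rows shown for CountyName.
sample = ["Donegal", "Kildare", "Dublin", "Dublin", "Dublin"]

counts = category_counts(sample)  # "Lu" = uppercase letter, "Ll" = lowercase letter
mean_length = sum(len(v) for v in sample) / len(sample)

print(dict(counts))
print(f"Mean length of sample: {mean_length}")

# For the full column, mean length is total characters over observations.
full_mean_length = 6_150_903 / 1_005_348
print(f"Mean length of column: {full_mean_length:.6f}")
```

Each county name starts with exactly one capital letter, which is consistent with the uppercase-letter count in the report being equal to the number of observations.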

Common Values

Value               Count     Frequency (%)
Dublin              298379    29.7%
Cork                113668    11.3%
Galway              54664     5.4%
Kildare             44498     4.4%
Limerick            43389     4.3%
Meath               38687     3.8%
Wexford             33527     3.3%
Tipperary           32036     3.2%
Kerry               31871     3.2%
Donegal             31259     3.1%
Other values (16)   283370    28.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
dublin298379
29.7%
cork113668
 
11.3%
galway54664
 
5.4%
kildare44498
 
4.4%
limerick43389
 
4.3%
meath38687
 
3.8%
wexford33527
 
3.3%
tipperary32036
 
3.2%
kerry31871
 
3.2%
donegal31259
 
3.1%
Other values (16)283370
28.2%

Most occurring characters

ValueCountFrequency (%)
i554431
 
9.0%
l542786
 
8.8%
r471170
 
7.7%
a444420
 
7.2%
n417852
 
6.8%
o398954
 
6.5%
e367646
 
6.0%
D329638
 
5.4%
u327232
 
5.3%
b298379
 
4.9%
Other values (24)1998395
32.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5145555
83.7%
Uppercase Letter1005348
 
16.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i554431
10.8%
l542786
10.5%
r471170
9.2%
a444420
 
8.6%
n417852
 
8.1%
o398954
 
7.8%
e367646
 
7.1%
u327232
 
6.4%
b298379
 
5.8%
k203681
 
4.0%
Other values (13)1119004
21.7%
Uppercase Letter
ValueCountFrequency (%)
D329638
32.8%
C167180
16.6%
W108411
 
10.8%
L103970
 
10.3%
K92868
 
9.2%
M75617
 
7.5%
G54664
 
5.4%
T32036
 
3.2%
S15288
 
1.5%
O13452
 
1.3%

Most occurring scripts

ValueCountFrequency (%)
Latin6150903
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i554431
 
9.0%
l542786
 
8.8%
r471170
 
7.7%
a444420
 
7.2%
n417852
 
6.8%
o398954
 
6.5%
e367646
 
6.0%
D329638
 
5.4%
u327232
 
5.3%
b298379
 
4.9%
Other values (24)1998395
32.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII6150903
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i554431
 
9.0%
l542786
 
8.8%
r471170
 
7.7%
a444420
 
7.2%
n417852
 
6.8%
o398954
 
6.5%
e367646
 
6.0%
D329638
 
5.4%
u327232
 
5.3%
b298379
 
4.9%
Other values (24)1998395
32.5%
Distinct11
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size983.0 KiB
Detached house
291545 
Semi-detached house
272815 
Mid-terrace house
140621 
End of terrace house
77554 
Mid-floor apartment
65189 
Other values (6)
157624 

Length

Max length22
Median length20
Mean length16.91869084
Min length5

Characters and Unicode

Total characters17009172
Distinct characters29
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowDetached house
2nd rowDetached house
3rd rowSemi-detached house
4th rowSemi-detached house
5th rowSemi-detached house

Common Values

Value                    Count     Frequency (%)
Detached house           291545    29.0%
Semi-detached house      272815    27.1%
Mid-terrace house        140621    14.0%
End of terrace house     77554     7.7%
Mid-floor apartment      65189     6.5%
Top-floor apartment      56080     5.6%
Ground-floor apartment   54161     5.4%
House                    33288     3.3%
Maisonette               10920     1.1%
Apartment                2856      0.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
house815823
38.5%
detached291545
 
13.8%
semi-detached272815
 
12.9%
apartment178286
 
8.4%
mid-terrace140621
 
6.6%
end77554
 
3.7%
of77554
 
3.7%
terrace77554
 
3.7%
mid-floor65189
 
3.1%
top-floor56080
 
2.6%
Other values (4)65719
 
3.1%

Most occurring characters

ValueCountFrequency (%)
e2854791
16.8%
o1365398
 
8.0%
h1346895
 
7.9%
d1174700
 
6.9%
t1161266
 
6.8%
a1147490
 
6.7%
1113392
 
6.5%
u869984
 
5.1%
r844227
 
5.0%
s827062
 
4.9%
Other values (19)4303967
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter14301247
84.1%
Space Separator1113392
 
6.5%
Uppercase Letter1005667
 
5.9%
Dash Punctuation588866
 
3.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e2854791
20.0%
o1365398
9.5%
h1346895
9.4%
d1174700
8.2%
t1161266
8.1%
a1147490
8.0%
u869984
 
6.1%
r844227
 
5.9%
s827062
 
5.8%
c782535
 
5.5%
Other values (8)1926899
13.5%
Uppercase Letter
ValueCountFrequency (%)
D291864
29.0%
S272815
27.1%
M216730
21.6%
E77554
 
7.7%
T56080
 
5.6%
G54161
 
5.4%
H33288
 
3.3%
A2856
 
0.3%
B319
 
< 0.1%
Space Separator
ValueCountFrequency (%)
1113392
100.0%
Dash Punctuation
ValueCountFrequency (%)
-588866
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin15306914
90.0%
Common1702258
 
10.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e2854791
18.7%
o1365398
8.9%
h1346895
8.8%
d1174700
7.7%
t1161266
 
7.6%
a1147490
 
7.5%
u869984
 
5.7%
r844227
 
5.5%
s827062
 
5.4%
c782535
 
5.1%
Other values (17)2932566
19.2%
Common
ValueCountFrequency (%)
1113392
65.4%
-588866
34.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII17009172
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e2854791
16.8%
o1365398
 
8.0%
h1346895
 
7.9%
d1174700
 
6.9%
t1161266
 
6.8%
a1147490
 
6.7%
1113392
 
6.5%
u869984
 
5.1%
r844227
 
5.0%
s827062
 
4.9%
Other values (19)4303967
25.3%

YearofConstruction
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 261
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1982.779243
Minimum: 1753
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.7 MiB

Quantile statistics

Minimum: 1753
5-th percentile: 1900
Q1: 1972
median: 1996
Q3: 2005
95-th percentile: 2018
Maximum: 2022
Range: 269
Interquartile range (IQR): 33
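
Range and IQR follow directly from the quantiles listed here. The sketch below recomputes them and, as an extra step that is not part of the report, derives the conventional 1.5 × IQR (Tukey) fences that are often used to flag unusual values.

```python
# Quantiles copied from the YearofConstruction statistics.
minimum, q1, q3, maximum = 1753, 1972, 2005, 2022

value_range = maximum - minimum
iqr = q3 - q1

# Assumption: Tukey's rule with the usual factor of 1.5.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"Range: {value_range}, IQR: {iqr}")
print(f"Fences: [{lower_fence}, {upper_fence}]")
```

Under this rule the lower fence is 1922.5. Since the 5-th percentile is 1900, at least 5% of dwellings would fall below it, so the rule looks too aggressive for a column with this long tail of old buildings.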

Descriptive statistics

Standard deviation: 33.9271398
Coefficient of variation (CV): 0.01711090124
Kurtosis: 3.820401463
Mean: 1982.779243
Median Absolute Deviation (MAD): 12
Skewness: -1.793886072
Sum: 1993383146
Variance: 1151.050815
Monotonicity: Not monotonic
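
Two of these statistics are simple functions of the others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A short check using the rounded values printed in the report:

```python
# Values copied from the YearofConstruction descriptive statistics.
mean = 1982.779243
std = 33.9271398

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance as the square of the standard deviation

print(f"CV: {cv:.10f}")
print(f"Variance: {variance:.6f}")
```

Both results agree with the reported figures up to the rounding of the inputs.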
Histogram with fixed size bins (bins=50)
Value                Count     Frequency (%)
2006                 49227     4.9%
2004                 47123     4.7%
2005                 45820     4.6%
2003                 38902     3.9%
2007                 36824     3.7%
2002                 32444     3.2%
1900                 30916     3.1%
2000                 29095     2.9%
2001                 25248     2.5%
1998                 24029     2.4%
Other values (251)   645720    64.2%

Value   Count   Frequency (%)
1753    14      < 0.1%
1757    1       < 0.1%
1759    3       < 0.1%
1760    223     < 0.1%
1761    8       < 0.1%
1762    1       < 0.1%
1764    1       < 0.1%
1765    5       < 0.1%
1766    1       < 0.1%
1767    2       < 0.1%

Value   Count   Frequency (%)
2022    4870    0.5%
2021    10807   1.1%
2020    14814   1.5%
2019    18371   1.8%
2018    13715   1.4%
2017    10543   1.0%
2016    7770    0.8%
2015    4870    0.5%
2014    3208    0.3%
2013    2179    0.2%

EnergyRating
Categorical

HIGH CORRELATION

Distinct15
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size983.3 KiB
C2
124530 
C3
118185 
D1
114352 
C1
113782 
D2
98187 
Other values (10)
436312 

Length

Max length2
Median length2
Mean length1.887439971
Min length1

Characters and Unicode

Total characters1897534
Distinct characters10
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowC2
2nd rowB3
3rd rowC3
4th rowC2
5th rowD2

Common Values

Value              Count     Frequency (%)
C2                 124530    12.4%
C3                 118185    11.8%
D1                 114352    11.4%
C1                 113782    11.3%
D2                 98187     9.8%
B3                 77970     7.8%
G                  66815     6.6%
E1                 56631     5.6%
A3                 51194     5.1%
F                  46347     4.6%
Other values (5)   137355    13.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
c2124530
12.4%
c3118185
11.8%
d1114352
11.4%
c1113782
11.3%
d298187
9.8%
b377970
7.8%
g66815
6.6%
e156631
5.6%
a351194
 
5.1%
f46347
 
4.6%
Other values (5)137355
13.7%

Most occurring characters

ValueCountFrequency (%)
C356497
18.8%
2343553
18.1%
1301284
15.9%
3247349
13.0%
D212539
11.2%
B126045
 
6.6%
E101411
 
5.3%
A95694
 
5.0%
G66815
 
3.5%
F46347
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1005348
53.0%
Decimal Number892186
47.0%
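
The category totals for EnergyRating can be cross-checked against the value counts. Every rating carries exactly one uppercase letter, and only the single-letter ratings G and F carry no digit. The sketch below recomputes the digit count, the total character count and the mean length under that reading; the variable names are illustrative.

```python
n_rows = 1_005_348

# Counts copied from the EnergyRating tables.
uppercase_letters = 1_005_348
decimal_digits = 892_186
single_letter_rows = 66_815 + 46_347  # ratings "G" and "F"

# Rows with a two-character rating contribute exactly one digit each.
expected_digits = n_rows - single_letter_rows
total_characters = uppercase_letters + decimal_digits
mean_length = total_characters / n_rows

print(f"Digits expected: {expected_digits}, reported: {decimal_digits}")
print(f"Total characters: {total_characters}")
print(f"Mean length: {mean_length:.9f}")
```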

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C356497
35.5%
D212539
21.1%
B126045
 
12.5%
E101411
 
10.1%
A95694
 
9.5%
G66815
 
6.6%
F46347
 
4.6%
Decimal Number
ValueCountFrequency (%)
2343553
38.5%
1301284
33.8%
3247349
27.7%

Most occurring scripts

ValueCountFrequency (%)
Latin1005348
53.0%
Common892186
47.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
C356497
35.5%
D212539
21.1%
B126045
 
12.5%
E101411
 
10.1%
A95694
 
9.5%
G66815
 
6.6%
F46347
 
4.6%
Common
ValueCountFrequency (%)
2343553
38.5%
1301284
33.8%
3247349
27.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII1897534
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C356497
18.8%
2343553
18.1%
1301284
15.9%
3247349
13.0%
D212539
11.2%
B126045
 
6.6%
E101411
 
5.3%
A95694
 
5.0%
G66815
 
3.5%
F46347
 
2.4%

BerRating
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 77722
Distinct (%): 7.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 238.9152415
Minimum: -158.42
Maximum: 56423.71
Zeros: 0
Zeros (%): 0.0%
Negative: 161
Negative (%): < 0.1%
Memory size: 7.7 MiB

Quantile statistics

Minimum: -158.42
5-th percentile: 51.89
Q1: 158.08
median: 209.93
Q3: 285.29
95-th percentile: 497.57
Maximum: 56423.71
Range: 56582.13
Interquartile range (IQR): 127.21

Descriptive statistics

Standard deviation: 173.5624388
Coefficient of variation (CV): 0.7264603034
Kurtosis: 13616.76436
Mean: 238.9152415
Median Absolute Deviation (MAD): 60.88
Skewness: 52.94408732
Sum: 240192960.2
Variance: 30123.92017
Monotonicity: Not monotonic
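
The skewness value (γ1) summarises how lopsided the distribution is; here a handful of extreme maxima pull the mean (238.9) well above the median (209.93). The sketch below implements the plain moment-based definition, m3 / m2^1.5, on toy data. It is only an illustration: the profiling tool may apply a sample-size bias correction, so its figures need not match this formula exactly.

```python
def skewness(values):
    """Moment-based (population) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Toy data, not taken from the dataset.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
long_right_tail = [1.0, 2.0, 3.0, 4.0, 500.0]

print(f"Symmetric sample: {skewness(symmetric):.3f}")
print(f"Sample with one extreme value: {skewness(long_right_tail):.3f}")
```

A single extreme value is enough to push the statistic strongly positive, which is the pattern behind the Skewed alerts for BerRating and CO2Rating.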
Histogram with fixed size bins (bins=50)
Value                  Count      Frequency (%)
224.56                 102        < 0.1%
224.87                 93         < 0.1%
224.85                 93         < 0.1%
224.63                 89         < 0.1%
224.82                 87         < 0.1%
224.95                 87         < 0.1%
174.81                 87         < 0.1%
224.86                 86         < 0.1%
174.92                 86         < 0.1%
174.79                 86         < 0.1%
Other values (77712)   1004452    99.9%

Value      Count   Frequency (%)
-158.42    1       < 0.1%
-97.37     1       < 0.1%
-63.96     1       < 0.1%
-60.97     1       < 0.1%
-56.06     1       < 0.1%
-49.16     1       < 0.1%
-48.01     1       < 0.1%
-45.32     1       < 0.1%
-44.66     1       < 0.1%
-43.64     1       < 0.1%

Value      Count   Frequency (%)
56423.71   1       < 0.1%
32134.94   1       < 0.1%
31623.33   1       < 0.1%
21725.62   1       < 0.1%
18771.31   1       < 0.1%
13914.78   1       < 0.1%
11823.78   1       < 0.1%
11476.29   1       < 0.1%
9892.94    1       < 0.1%
9183.17    1       < 0.1%

GroundFloorArea(sq m)
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 43114
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 114.4886734
Minimum: 5.47
Maximum: 3546.11
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.7 MiB

Quantile statistics

Minimum: 5.47
5-th percentile: 47.66
Q1: 77.9
median: 100.3
Q3: 134.48
95-th percentile: 231.21
Maximum: 3546.11
Range: 3540.64
Interquartile range (IQR): 56.58

Descriptive statistics

Standard deviation: 60.4344775
Coefficient of variation (CV): 0.5278642481
Kurtosis: 35.93359276
Mean: 114.4886734
Median Absolute Deviation (MAD): 26.34
Skewness: 2.823807777
Sum: 115100958.8
Variance: 3652.326071
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count     Frequency (%)
81                     1324      0.1%
80                     1139      0.1%
84                     964       0.1%
90                     960       0.1%
82                     952       0.1%
78                     783       0.1%
88                     774       0.1%
70                     747       0.1%
85                     734       0.1%
108                    694       0.1%
Other values (43104)   996277    99.1%

Value   Count   Frequency (%)
5.47    1       < 0.1%
6.7     1       < 0.1%
7.21    1       < 0.1%
7.26    1       < 0.1%
7.47    1       < 0.1%
7.7     1       < 0.1%
7.91    1       < 0.1%
7.96    1       < 0.1%
8.3     1       < 0.1%
8.31    1       < 0.1%

Value     Count   Frequency (%)
3546.11   1       < 0.1%
3229.39   1       < 0.1%
2331.92   1       < 0.1%
2011.25   1       < 0.1%
1825.99   1       < 0.1%
1788.6    1       < 0.1%
1705.02   1       < 0.1%
1625.34   1       < 0.1%
1593      1       < 0.1%
1572.51   1       < 0.1%

CO2Rating
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 29089
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 55.64251746
Minimum: -88.57
Maximum: 18417.1
Zeros: 1
Zeros (%): < 0.1%
Negative: 185
Negative (%): < 0.1%
Memory size: 7.7 MiB

Quantile statistics

Minimum: -88.57
5-th percentile: 9.91
Q1: 33.55
median: 46.98
Q3: 65.91
95-th percentile: 124.63
Maximum: 18417.1
Range: 18505.67
Interquartile range (IQR): 32.36

Descriptive statistics

Standard deviation: 48.95306043
Coefficient of variation (CV): 0.8797779588
Kurtosis: 22151.18262
Mean: 55.64251746
Median Absolute Deviation (MAD): 15.44
Skewness: 69.10763716
Sum: 55940093.64
Variance: 2396.402125
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count      Frequency (%)
41.39                  236        < 0.1%
43.4                   233        < 0.1%
44.12                  232        < 0.1%
36.44                  231        < 0.1%
41.44                  228        < 0.1%
38.77                  227        < 0.1%
40.98                  226        < 0.1%
42.08                  226        < 0.1%
37.44                  226        < 0.1%
41.45                  226        < 0.1%
Other values (29079)   1003057    99.8%

Value    Count   Frequency (%)
-88.57   1       < 0.1%
-27.98   1       < 0.1%
-23.82   1       < 0.1%
-20.25   1       < 0.1%
-17.51   1       < 0.1%
-16.02   1       < 0.1%
-14.49   1       < 0.1%
-11.99   1       < 0.1%
-11.22   1       < 0.1%
-11.02   1       < 0.1%

Value     Count   Frequency (%)
18417.1   1       < 0.1%
10541     1       < 0.1%
5840.25   1       < 0.1%
4019.2    1       < 0.1%
3467.31   1       < 0.1%
3327.24   1       < 0.1%
3283.77   1       < 0.1%
2822.29   1       < 0.1%
2817.23   1       < 0.1%
2760.17   1       < 0.1%

MainSpaceHeatingFuel
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct20
Distinct (%)< 0.1%
Missing14110
Missing (%)1.4%
Memory size64.5 MiB
Mains Gas
382337 
Heating Oil
364334 
Electricity
185279 
Solid Multi-Fuel
 
30996
Bulk LPG (propane or butane)
 
13573
Other values (15)
 
14719

Length

Max length30
Median length11
Mean length10.74701232
Min length8

Characters and Unicode

Total characters10652847
Distinct characters42
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowHeating Oil
2nd rowHeating Oil
3rd rowMains Gas
4th rowMains Gas
5th rowMains Gas

Common Values

Value                           Count     Frequency (%)
Mains Gas                       382337    38.0%
Heating Oil                     364334    36.2%
Electricity                     185279    18.4%
Solid Multi-Fuel                30996     3.1%
Bulk LPG (propane or butane)    13573     1.4%
Manufactured Smokeless Fuel     6614      0.7%
House Coal                      3120      0.3%
Wood Pellets (bulk supply for   1321      0.1%
Sod Peat                        1221      0.1%
Bottled LPG                     1099      0.1%
Other values (10)               1344      0.1%
(Missing)                       14110     1.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
mains382337
20.7%
gas382337
20.7%
heating364334
19.7%
oil364334
19.7%
electricity185369
10.0%
solid30996
 
1.7%
multi-fuel30996
 
1.7%
bulk14894
 
0.8%
lpg14672
 
0.8%
butane13573
 
0.7%
Other values (34)65605
 
3.5%

Most occurring characters

ValueCountFrequency (%)
i1544425
14.5%
a1174393
11.0%
858209
 
8.1%
t792255
 
7.4%
s785310
 
7.4%
n780742
 
7.3%
l679417
 
6.4%
e643982
 
6.0%
M419947
 
3.9%
G397009
 
3.7%
Other values (32)2577158
24.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7870541
73.9%
Uppercase Letter1864230
 
17.5%
Space Separator858209
 
8.1%
Dash Punctuation31194
 
0.3%
Open Punctuation15100
 
0.1%
Close Punctuation13573
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i1544425
19.6%
a1174393
14.9%
t792255
10.1%
s785310
10.0%
n780742
9.9%
l679417
8.6%
e643982
8.2%
c377622
 
4.8%
g365252
 
4.6%
r221036
 
2.8%
Other values (12)506107
 
6.4%
Uppercase Letter
ValueCountFrequency (%)
M419947
22.5%
G397009
21.3%
H367454
19.7%
O364388
19.5%
E185369
9.9%
S38867
 
2.1%
F37610
 
2.0%
P17689
 
0.9%
L15330
 
0.8%
B14946
 
0.8%
Other values (6)5621
 
0.3%
Space Separator
ValueCountFrequency (%)
858209
100.0%
Dash Punctuation
ValueCountFrequency (%)
-31194
100.0%
Open Punctuation
ValueCountFrequency (%)
(15100
100.0%
Close Punctuation
ValueCountFrequency (%)
)13573
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin9734771
91.4%
Common918076
 
8.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
i1544425
15.9%
a1174393
12.1%
t792255
 
8.1%
s785310
 
8.1%
n780742
 
8.0%
l679417
 
7.0%
e643982
 
6.6%
M419947
 
4.3%
G397009
 
4.1%
c377622
 
3.9%
Other values (28)2139669
22.0%
Common
ValueCountFrequency (%)
858209
93.5%
-31194
 
3.4%
(15100
 
1.6%
)13573
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII10652847
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i1544425
14.5%
a1174393
11.0%
858209
 
8.1%
t792255
 
7.4%
s785310
 
7.4%
n780742
 
7.3%
l679417
 
6.4%
e643982
 
6.0%
M419947
 
3.9%
G397009
 
3.7%
Other values (32)2577158
24.2%

MainWaterHeatingFuel
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct21
Distinct (%)< 0.1%
Missing14110
Missing (%)1.4%
Memory size64.5 MiB
Mains Gas
380320 
Heating Oil
361268 
Electricity
193666 
Solid Multi-Fuel
 
28848
Bulk LPG (propane or butane)
 
13520
Other values (16)
 
13616

Length

Max length30
Median length11
Mean length10.72271644
Min length4

Characters and Unicode

Total characters10628764
Distinct characters42
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowHeating Oil
2nd rowHeating Oil
3rd rowMains Gas
4th rowMains Gas
5th rowMains Gas

Common Values

Value                           Count     Frequency (%)
Mains Gas                       380320    37.8%
Heating Oil                     361268    35.9%
Electricity                     193666    19.3%
Solid Multi-Fuel                28848     2.9%
Bulk LPG (propane or butane)    13520     1.3%
Manufactured Smokeless Fuel     5604      0.6%
House Coal                      2959      0.3%
Wood Pellets (bulk supply for   1271      0.1%
Sod Peat                        1244      0.1%
Bottled LPG                     1204      0.1%
Other values (11)               1334      0.1%
(Missing)                       14110     1.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
mains380320
20.7%
gas380320
20.7%
heating361268
19.6%
oil361268
19.6%
electricity193757
10.5%
solid28848
 
1.6%
multi-fuel28848
 
1.6%
bulk14791
 
0.8%
lpg14724
 
0.8%
butane13520
 
0.7%
Other values (35)62126
 
3.4%

Most occurring characters

ValueCountFrequency (%)
i1548811
14.6%
a1165086
11.0%
848552
 
8.0%
t803001
 
7.6%
s779006
 
7.3%
n774603
 
7.3%
l676008
 
6.4%
e642964
 
6.0%
M414772
 
3.9%
G395044
 
3.7%
Other values (32)2580917
24.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7869913
74.0%
Uppercase Letter1852748
 
17.4%
Space Separator848552
 
8.0%
Dash Punctuation29021
 
0.3%
Open Punctuation15010
 
0.1%
Close Punctuation13520
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i1548811
19.7%
a1165086
14.8%
t803001
10.2%
s779006
9.9%
n774603
9.8%
l676008
8.6%
e642964
8.2%
c393424
 
5.0%
g362123
 
4.6%
r228326
 
2.9%
Other values (12)496561
 
6.3%
Uppercase Letter
ValueCountFrequency (%)
M414772
22.4%
G395044
21.3%
H364227
19.7%
O361309
19.5%
E193757
10.5%
S35746
 
1.9%
F34452
 
1.9%
P17724
 
1.0%
L15319
 
0.8%
B15003
 
0.8%
Other values (6)5395
 
0.3%
Space Separator
ValueCountFrequency (%)
848552
100.0%
Dash Punctuation
ValueCountFrequency (%)
-29021
100.0%
Open Punctuation
ValueCountFrequency (%)
(15010
100.0%
Close Punctuation
ValueCountFrequency (%)
)13520
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin9722661
91.5%
Common906103
 
8.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
i1548811
15.9%
a1165086
12.0%
t803001
 
8.3%
s779006
 
8.0%
n774603
 
8.0%
l676008
 
7.0%
e642964
 
6.6%
M414772
 
4.3%
G395044
 
4.1%
c393424
 
4.0%
Other values (28)2129942
21.9%
Common
ValueCountFrequency (%)
848552
93.6%
-29021
 
3.2%
(15010
 
1.7%
)13520
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII10628764
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i1548811
14.6%
a1165086
11.0%
848552
 
8.0%
t803001
 
7.6%
s779006
 
7.3%
n774603
 
7.3%
l676008
 
6.4%
e642964
 
6.0%
M414772
 
3.9%
G395044
 
3.7%
Other values (32)2580917
24.3%

VentilationMethod
Categorical

HIGH CORRELATION

Distinct6
Distinct (%)< 0.1%
Missing2901
Missing (%)0.3%
Memory size67.8 MiB
Natural vent.
942726 
Bal.whole mech.vent heat recvr
 
31490
Whole house extract vent.
 
26036
Pos input vent.- loft
 
1378
Bal.whole mech.vent no heat re
 
429

Length

Max length30
Median length13
Mean length13.86822246
Min length13

Characters and Unicode

Total characters13902158
Distinct characters26
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNatural vent.
2nd rowNatural vent.
3rd rowNatural vent.
4th rowNatural vent.
5th rowNatural vent.

Common Values

Value                            Count     Frequency (%)
Natural vent.                    942726    93.8%
Bal.whole mech.vent heat recvr   31490     3.1%
Whole house extract vent.        26036     2.6%
Pos input vent.- loft            1378      0.1%
Bal.whole mech.vent no heat re   429       < 0.1%
Pos input vent.- outside         388       < 0.1%
(Missing)                        2901      0.3%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
vent970528
45.7%
natural942726
44.4%
bal.whole31919
 
1.5%
mech.vent31919
 
1.5%
heat31919
 
1.5%
recvr31490
 
1.5%
whole26036
 
1.2%
house26036
 
1.2%
extract26036
 
1.2%
pos1766
 
0.1%
Other values (5)4390
 
0.2%

Most occurring characters

ValueCountFrequency (%)
t2032696
14.6%
a1975326
14.2%
e1208619
8.7%
1122318
8.1%
.1034366
7.4%
l1033978
7.4%
v1033937
7.4%
r1032171
7.4%
n1004642
7.2%
u970916
7.0%
Other values (16)1453189
10.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter10741261
77.3%
Space Separator1122318
 
8.1%
Other Punctuation1034366
 
7.4%
Uppercase Letter1002447
 
7.2%
Dash Punctuation1766
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t2032696
18.9%
a1975326
18.4%
e1208619
11.3%
l1033978
9.6%
v1033937
9.6%
r1032171
9.6%
n1004642
9.4%
u970916
9.0%
h147829
 
1.4%
c89445
 
0.8%
Other values (9)211702
 
2.0%
Uppercase Letter
ValueCountFrequency (%)
N942726
94.0%
B31919
 
3.2%
W26036
 
2.6%
P1766
 
0.2%
Space Separator
ValueCountFrequency (%)
1122318
100.0%
Other Punctuation
ValueCountFrequency (%)
.1034366
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1766
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin11743708
84.5%
Common2158450
 
15.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
t2032696
17.3%
a1975326
16.8%
e1208619
10.3%
l1033978
8.8%
v1033937
8.8%
r1032171
8.8%
n1004642
8.6%
u970916
8.3%
N942726
8.0%
h147829
 
1.3%
Other values (13)360868
 
3.1%
Common
ValueCountFrequency (%)
1122318
52.0%
.1034366
47.9%
-1766
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII13902158
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t2032696
14.6%
a1975326
14.2%
e1208619
8.7%
1122318
8.1%
.1034366
7.4%
l1033978
7.4%
v1033937
7.4%
r1032171
7.4%
n1004642
7.2%
u970916
7.0%
Other values (16)1453189
10.5%

StructureType
Categorical

MISSING

Distinct3
Distinct (%)< 0.1%
Missing72276
Missing (%)7.2%
Memory size60.0 MiB
Masonry
867161 
Timber or Steel Frame
 
59584
Insulated Conctete Form
 
6327

Length

Max length23
Median length7
Mean length8.002503558
Min length7

Characters and Unicode

Total characters7466912
Distinct characters22
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMasonry
2nd rowMasonry
3rd rowMasonry
4th rowMasonry
5th rowMasonry

Common Values

Value                     Count     Frequency (%)
Masonry                   867161    86.3%
Timber or Steel Frame     59584     5.9%
Insulated Conctete Form   6327      0.6%
(Missing)                 72276     7.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
masonry867161
77.1%
timber59584
 
5.3%
or59584
 
5.3%
steel59584
 
5.3%
frame59584
 
5.3%
insulated6327
 
0.6%
conctete6327
 
0.6%
form6327
 
0.6%

Most occurring characters

ValueCountFrequency (%)
r1052240
14.1%
o939399
12.6%
a933072
12.5%
n879815
11.8%
s873488
11.7%
M867161
11.6%
y867161
11.6%
e257317
 
3.4%
191406
 
2.6%
m125495
 
1.7%
Other values (12)480358
6.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6210612
83.2%
Uppercase Letter1064894
 
14.3%
Space Separator191406
 
2.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r1052240
16.9%
o939399
15.1%
a933072
15.0%
n879815
14.2%
s873488
14.1%
y867161
14.0%
e257317
 
4.1%
m125495
 
2.0%
t78565
 
1.3%
l65911
 
1.1%
Other values (5)138149
 
2.2%
Uppercase Letter
ValueCountFrequency (%)
M867161
81.4%
F65911
 
6.2%
T59584
 
5.6%
S59584
 
5.6%
I6327
 
0.6%
C6327
 
0.6%
Space Separator
ValueCountFrequency (%)
191406
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7275506
97.4%
Common191406
 
2.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
r1052240
14.5%
o939399
12.9%
a933072
12.8%
n879815
12.1%
s873488
12.0%
M867161
11.9%
y867161
11.9%
e257317
 
3.5%
m125495
 
1.7%
t78565
 
1.1%
Other values (11)401793
 
5.5%
Common
ValueCountFrequency (%)
191406
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII7466912
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r1052240
14.1%
o939399
12.6%
a933072
12.5%
n879815
11.8%
s873488
11.7%
M867161
11.6%
y867161
11.6%
e257317
 
3.4%
191406
 
2.6%
m125495
 
1.7%
Other values (12)480358
6.4%
Distinct5
Distinct (%)< 0.1%
Missing2901
Missing (%)0.3%
Memory size982.4 KiB
two
407774 
three
271434 
four
140400 
one
118539 
zero
64300 

Length

Max length5
Median length3
Mean length3.745743166
Min length3

Characters and Unicode

Total characters3754909
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowone
2nd rowtwo
3rd rowthree
4th rowtwo
5th rowtwo

Common Values

Value       Count     Frequency (%)
two         407774    40.6%
three       271434    27.0%
four        140400    14.0%
one         118539    11.8%
zero        64300     6.4%
(Missing)   2901      0.3%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
two407774
40.7%
three271434
27.1%
four140400
 
14.0%
one118539
 
11.8%
zero64300
 
6.4%

Most occurring characters

ValueCountFrequency (%)
o731013
19.5%
e725707
19.3%
t679208
18.1%
r476134
12.7%
w407774
10.9%
h271434
 
7.2%
f140400
 
3.7%
u140400
 
3.7%
n118539
 
3.2%
z64300
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3754909
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o731013
19.5%
e725707
19.3%
t679208
18.1%
r476134
12.7%
w407774
10.9%
h271434
 
7.2%
f140400
 
3.7%
u140400
 
3.7%
n118539
 
3.2%
z64300
 
1.7%

Most occurring scripts

ValueCountFrequency (%)
Latin3754909
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o731013
19.5%
e725707
19.3%
t679208
18.1%
r476134
12.7%
w407774
10.9%
h271434
 
7.2%
f140400
 
3.7%
u140400
 
3.7%
n118539
 
3.2%
z64300
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII3754909
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o731013
19.5%
e725707
19.3%
t679208
18.1%
r476134
12.7%
w407774
10.9%
h271434
 
7.2%
f140400
 
3.7%
u140400
 
3.7%
n118539
 
3.2%
z64300
 
1.7%

InsulationType
Categorical

HIGH CORRELATION
MISSING

Distinct3
Distinct (%)< 0.1%
Missing221093
Missing (%)22.0%
Memory size60.0 MiB
Factory Insulated
494467 
Loose Jacket
197854 
None
91934 

Length

Max length17
Median length17
Mean length14.21466615
Min length4

Characters and Unicode

Total characters11147923
Distinct characters19
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row  Factory Insulated
2nd row  Factory Insulated
3rd row  Loose Jacket
4th row  Loose Jacket
5th row  Factory Insulated

Common Values

Value              Count   Frequency (%)
Factory Insulated  494467  49.2%
Loose Jacket       197854  19.7%
None               91934   9.1%
(Missing)          221093  22.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value      Count   Frequency (%)
factory    494467  33.5%
insulated  494467  33.5%
loose      197854  13.4%
jacket     197854  13.4%
none       91934   6.2%

Most occurring characters

Value             Count    Frequency (%)
t                 1186788  10.6%
a                 1186788  10.6%
o                 982109   8.8%
e                 982109   8.8%
c                 692321   6.2%
(space)           692321   6.2%
s                 692321   6.2%
n                 586401   5.3%
u                 494467   4.4%
d                 494467   4.4%
Other values (9)  3157831  28.3%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  8979026  80.5%
Uppercase Letter  1476576  13.2%
Space Separator   692321   6.2%

Most frequent character per category

Lowercase Letter
Value             Count    Frequency (%)
t                 1186788  13.2%
a                 1186788  13.2%
o                 982109   10.9%
e                 982109   10.9%
c                 692321   7.7%
s                 692321   7.7%
n                 586401   6.5%
u                 494467   5.5%
d                 494467   5.5%
l                 494467   5.5%
Other values (3)  1186788  13.2%

Uppercase Letter
Value  Count   Frequency (%)
F      494467  33.5%
I      494467  33.5%
L      197854  13.4%
J      197854  13.4%
N      91934   6.2%

Space Separator
Value    Count   Frequency (%)
(space)  692321  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Latin   10455602  93.8%
Common  692321    6.2%

Most frequent character per script

Latin
Value             Count    Frequency (%)
t                 1186788  11.4%
a                 1186788  11.4%
o                 982109   9.4%
e                 982109   9.4%
c                 692321   6.6%
s                 692321   6.6%
n                 586401   5.6%
u                 494467   4.7%
d                 494467   4.7%
l                 494467   4.7%
Other values (8)  2663364  25.5%

Common
Value    Count   Frequency (%)
(space)  692321  100.0%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  11147923  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
t                 1186788  10.6%
a                 1186788  10.6%
o                 982109   8.8%
e                 982109   8.8%
c                 692321   6.2%
(space)           692321   6.2%
s                 692321   6.2%
n                 586401   5.3%
u                 494467   4.4%
d                 494467   4.4%
Other values (9)  3157831  28.3%

InsulationThickness
Real number (ℝ≥0)

MISSING
ZEROS

Distinct      183
Distinct (%)  < 0.1%
Missing       221093
Missing (%)   22.0%
Infinite      0
Infinite (%)  0.0%
Mean          31.59133146
Minimum       0
Maximum       1872
Zeros         122900
Zeros (%)     12.2%
Negative      0
Negative (%)  0.0%
Memory size   7.7 MiB

Quantile statistics

Minimum                    0
5-th percentile            0
Q1                         25
Median                     30
Q3                         40
95-th percentile           80
Maximum                    1872
Range                      1872
Interquartile range (IQR)  15

Descriptive statistics

Standard deviation               22.94257629
Coefficient of variation (CV)    0.7262301155
Kurtosis                         1133.880446
Mean                             31.59133146
Median Absolute Deviation (MAD)  10
Skewness                         15.44642207
Sum                              24775659.65
Variance                         526.361807
Monotonicity                     Not monotonic
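
Several of these figures are derived from the others, so they can be cross-checked. The sketch below recomputes them from the values printed in this report for InsulationThickness. The variable names are illustrative only, and the mean check assumes the mean is taken over non-missing observations.

```python
# Values copied from the report (InsulationThickness and dataset overview).
n_observations = 1005348
n_missing = 221093
total_sum = 24775659.65
mean_value = 31.59133146
std_dev = 22.94257629
q1, q3 = 25, 40
minimum, maximum = 0, 1872

# Derived statistics and the formulas assumed for them.
mean_check = total_sum / (n_observations - n_missing)  # sum / non-missing count
cv = std_dev / mean_value                              # coefficient of variation
variance = std_dev ** 2                                # variance = std^2
iqr = q3 - q1                                          # interquartile range
value_range = maximum - minimum                        # range

print(round(mean_check, 4), round(cv, 4), round(variance, 3), iqr, value_range)
```

Each recomputed value agrees with the corresponding figure in the report to the precision shown there.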
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
30                  147632  14.7%
0                   122900  12.2%
25                  117613  11.7%
50                  105097  10.5%
35                  92410   9.2%
40                  62376   6.2%
20                  46638   4.6%
80                  34369   3.4%
60                  14115   1.4%
15                  7712    0.8%
Other values (173)  33393   3.3%
(Missing)           221093  22.0%
Minimum 10 values
Value  Count   Frequency (%)
0      122900  12.2%
1      190     < 0.1%
1.75   21      < 0.1%
1.79   1       < 0.1%
1.89   2       < 0.1%
1.91   5       < 0.1%
1.92   1       < 0.1%
2      62      < 0.1%
2.33   1       < 0.1%
2.35   2       < 0.1%
Maximum 10 values
Value  Count  Frequency (%)
1872   21     < 0.1%
890    1      < 0.1%
870    1      < 0.1%
801    1      < 0.1%
800    1      < 0.1%
670    2      < 0.1%
660    1      < 0.1%
600    4      < 0.1%
580    3      < 0.1%
560    1      < 0.1%

TotalDeliveredEnergy
Real number (ℝ)

HIGH CORRELATION
MISSING
SKEWED

Distinct      431407
Distinct (%)  99.2%
Missing       570448
Missing (%)   56.7%
Infinite      0
Infinite (%)  0.0%
Mean          24403.69201
Minimum       -3929.793
Maximum       5431169.676
Zeros         0
Zeros (%)     0.0%
Negative      3
Negative (%)  < 0.1%
Memory size   7.7 MiB

Quantile statistics

Minimum                    -3929.793
5-th percentile            8305.22705
Q1                         15190.84325
Median                     21318.4585
Q3                         29593.121
95-th percentile           49214.57105
Maximum                    5431169.676
Range                      5435099.469
Interquartile range (IQR)  14402.27775

Descriptive statistics

Standard deviation               23488.45498
Coefficient of variation (CV)    0.9624959602
Kurtosis                         14527.40373
Mean                             24403.69201
Median Absolute Deviation (MAD)  6932.2075
Skewness                         87.3140669
Sum                              1.061316566 × 10^10
Variance                         551707517.1
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                  Count   Frequency (%)
184.697                22      < 0.1%
641.717                16      < 0.1%
184.877                14      < 0.1%
541.268                12      < 0.1%
919.127                12      < 0.1%
477.232                12      < 0.1%
292.044                11      < 0.1%
99.907                 11      < 0.1%
992.339                10      < 0.1%
368.215                9       < 0.1%
Other values (431397)  434771  43.2%
(Missing)              570448  56.7%
Minimum 10 values
Value      Count  Frequency (%)
-3929.793  1      < 0.1%
-2843.478  1      < 0.1%
-1805.047  1      < 0.1%
50.563     1      < 0.1%
56.181     1      < 0.1%
69.853     1      < 0.1%
72.877     1      < 0.1%
73.373     1      < 0.1%
74.998     1      < 0.1%
77.344     1      < 0.1%
Maximum 10 values
Value        Count  Frequency (%)
5431169.676  1      < 0.1%
4129347.104  1      < 0.1%
3846582.646  1      < 0.1%
3444097.206  1      < 0.1%
3343274.743  1      < 0.1%
3133044.042  1      < 0.1%
2868451.102  1      < 0.1%
2844373.55   1      < 0.1%
2370104.193  1      < 0.1%
2050664.845  1      < 0.1%

Interactions

[Figure: 36 pairwise interaction plots]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
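
A minimal sketch of that calculation, using only the Python standard library: rank both variables (ties share the average rank, which is an assumption about tie handling), then take the Pearson correlation of the ranks. The function names and the toy data are illustrative and are not taken from the report.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data: y = x**2 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rho(x, y), pearson_r(x, y))
```

On the toy data ρ is 1 (up to floating-point rounding) because the relationship is perfectly monotonic, while Pearson's r is below 1 because it is not linear.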

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
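
The formula can be sketched in a few lines of standard-library Python. The toy data and names below are illustrative only; the second call rescales and shifts both variables to show the invariance described above.

```python
from statistics import mean

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors cancel, so plain sums of deviations are used.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (sx * sy)

# Made-up, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Separate positive rescaling and shifting of each variable leaves r unchanged.
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_transformed)
```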

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
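
The pair-counting definition can be sketched directly. This is the simple variant often called tau-a, which counts tied pairs as neither concordant nor discordant. Statistical libraries commonly report a tie-adjusted variant instead, so values from the report may differ where ties are present. The function name and toy data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, in O(n^2) time."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 means a tie in x or y: counted in neither group
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

# Three observations: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The quadratic loop is fine for illustration but would be slow on a dataset of this report's size.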

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
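
As a hedged sketch of a bias-corrected Cramér's V, the function below works from a contingency table of counts given as a list of rows. It follows the commonly cited form of Bergsma's correction; whether the report's implementation matches it term for term is an assumption. The function name and example tables are illustrative.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table of counts.

    Assumes every row and column total is positive and the table is at
    least 2 x 2.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

independent = [[10, 20], [20, 40]]   # rows proportional: no association
perfect = [[50, 0], [0, 50]]         # each row maps to exactly one column
print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```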

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
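
One way to compute such a nullity correlation, assuming it is the Pearson correlation between per-column presence indicators (1 where a value is present, 0 where it is missing), is sketched below. The helper names and the toy columns are made up for illustration and are not rows from the dataset.

```python
from statistics import mean

def pearson_r(x, y):
    """Plain Pearson correlation; undefined if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return cov / norm

def nullity_correlation(col_a, col_b):
    """Correlate presence indicators: 1.0 where present, 0.0 where None."""
    present_a = [0.0 if v is None else 1.0 for v in col_a]
    present_b = [0.0 if v is None else 1.0 for v in col_b]
    return pearson_r(present_a, present_b)

# Toy columns; None marks a missing cell. The string "None" is a real category.
insulation_type = ["Factory Insulated", None, "Loose Jacket", None, "None", "Loose Jacket"]
insulation_thickness = [20.0, None, 50.0, None, 0.0, 80.0]
delivered_energy = [None, 22708.48, None, 52927.53, None, None]

print(nullity_correlation(insulation_type, insulation_thickness))  # missing together
print(nullity_correlation(insulation_type, delivered_energy))      # missing in opposite rows
```

A value near +1 means two columns tend to be missing in the same rows, and a value near -1 means one tends to be present exactly where the other is missing. Columns with no missing values, or with only missing values, have a constant indicator and no defined correlation.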
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

Index | CountyName | DwellingTypeDescr | YearofConstruction | EnergyRating | BerRating | GroundFloorArea(sq m) | CO2Rating | MainSpaceHeatingFuel | MainWaterHeatingFuel | VentilationMethod | StructureType | NoOfSidesSheltered | InsulationType | InsulationThickness | TotalDeliveredEnergy
0 | Donegal | Detached house | 1997 | C2 | 180.01 | 171.19 | 45.53 | Heating Oil | Heating Oil | Natural vent. | Masonry | one | Factory Insulated | 20.00 | 25474.52
1 | Kildare | Detached house | 2010 | B3 | 137.56 | 242.93 | 35.66 | Heating Oil | Heating Oil | Natural vent. | Masonry | two | Factory Insulated | 50.00 | 27654.47
2 | Dublin | Semi-detached house | 1999 | C3 | 223.61 | 99.38 | 44.65 | Mains Gas | Mains Gas | Natural vent. | Masonry | three | Loose Jacket | 20.00 | 17000.04
3 | Dublin | Semi-detached house | 1965 | C2 | 196.99 | 138.41 | 37.83 | Mains Gas | Mains Gas | Natural vent. | Masonry | two | NaN | NaN | 22708.48
4 | Dublin | Semi-detached house | 1985 | D2 | 260.52 | 127.16 | 55.07 | Mains Gas | Mains Gas | Natural vent. | Masonry | two | Loose Jacket | 100.00 | 28182.86
5 | Donegal | House | 1975 | D1 | 248.00 | 88.57 | 62.68 | Heating Oil | Heating Oil | Natural vent. | Masonry | two | Factory Insulated | 0.00 | 18470.03
6 | Dublin | Semi-detached house | 1985 | D2 | 275.97 | 73.54 | 58.21 | Mains Gas | Mains Gas | Natural vent. | Masonry | two | Loose Jacket | 100.00 | 17227.86
7 | Limerick | Semi-detached house | 1960 | D1 | 244.71 | 89.54 | 59.79 | Heating Oil | Heating Oil | Natural vent. | Masonry | three | Loose Jacket | 80.00 | 16711.95
8 | Kerry | House | 1973 | D2 | 293.82 | 157.62 | 71.30 | Heating Oil | Heating Oil | Natural vent. | Masonry | one | Loose Jacket | 50.00 | 40212.93
9 | Kilkenny | Detached house | 1980 | D2 | 299.96 | 91.44 | 75.47 | Heating Oil | Heating Oil | Natural vent. | Masonry | one | Loose Jacket | 20.00 | 21839.38

Last rows

Index | CountyName | DwellingTypeDescr | YearofConstruction | EnergyRating | BerRating | GroundFloorArea(sq m) | CO2Rating | MainSpaceHeatingFuel | MainWaterHeatingFuel | VentilationMethod | StructureType | NoOfSidesSheltered | InsulationType | InsulationThickness | TotalDeliveredEnergy
1005338 | Dublin | Top-floor apartment | 2022 | A2 | 46.90 | 77.00 | 8.82 | NaN | NaN | Bal.whole mech.vent heat recvr | Masonry | two | NaN | NaN | NaN
1005339 | Dublin | Ground-floor apartment | 2004 | D2 | 281.34 | 73.50 | 55.32 | Electricity | Electricity | Natural vent. | Masonry | three | NaN | NaN | NaN
1005340 | Dublin | Mid-floor apartment | 2020 | A3 | 55.23 | 37.80 | 10.86 | Electricity | Electricity | NaN | NaN | NaN | NaN | NaN | NaN
1005341 | Dublin | Mid-floor apartment | 2020 | A2 | 46.32 | 50.89 | 9.11 | Electricity | Electricity | NaN | NaN | NaN | NaN | NaN | NaN
1005342 | Dublin | Mid-floor apartment | 2020 | A2 | 38.60 | 86.58 | 7.59 | Electricity | Electricity | NaN | NaN | NaN | NaN | NaN | NaN
1005343 | Donegal | Detached house | 1982 | D2 | 282.58 | 214.18 | 70.89 | Heating Oil | Heating Oil | Natural vent. | Masonry | one | NaN | NaN | 52927.53
1005344 | Dublin | Mid-terrace house | 1900 | G | 998.14 | 99.77 | 317.99 | Manufactured Smokeless Fuel | Electricity | Natural vent. | Masonry | four | NaN | NaN | NaN
1005345 | Dublin | Mid-floor apartment | 2021 | A2 | 37.26 | 81.32 | 7.33 | NaN | NaN | Whole house extract vent. | Masonry | two | NaN | NaN | NaN
1005346 | Dublin | Mid-floor apartment | 2022 | A2 | 36.05 | 82.09 | 6.76 | NaN | NaN | Bal.whole mech.vent heat recvr | Masonry | four | NaN | NaN | NaN
1005347 | Monaghan | Detached house | 2013 | B1 | 90.57 | 334.35 | 19.66 | Electricity | Electricity | Bal.whole mech.vent heat recvr | Timber or Steel Frame | one | NaN | NaN | NaN